Weibull-type limiting distribution for replicative systems 
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The Weibull function is widely used to describe skew distributions observed in nature. However, 
the origin of this ubiquity is not always obvious to explain. In the present study, we consider the 
well-known Galton-Watson branching process describing simple replicative systems. The shape of 
the resulting distribution, about which little has been known, is found essentially indistinguishable 
from the Weibull form in a wide range of the branching parameter; this can be seen from the exact 
series expansion for the cumulative distribution, which takes a universal form. We also find that 
the branching process can be mapped into a process of aggregation of clusters. In the branching 
and aggregation process, the number of events considered for branching and aggregation grows 
cumulatively in time, whereas for the binomial distribution, an independent event occurs at each 
time with a given success probability. 
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I. INTRODUCTION 

Various systems in nature exhibit skew distributions, 
which are properly fit to the Weibull distribution [l[ 
as well as lognormal and power-law distributions; rela- 
tions between those skew distributions have been dis- 
cussed recently Q. In particular, the Weibull distribu- 
tion, despite the simple mathematical form, particularly 
for the cumulative distribution F(x) = 1 — exp[— (x/rj)^], 
has flexible shapes depending on the value of j3 and is 
widely used to describe size distributions of, e.g., material 
strengths [l], [I[ , cloud droplets [|| , biological tissues @ , 
ocean wave heights @, and wind speeds Q. However, 
there still lacks an appropriate explanation of its ubiq- 
uitous emergence, in sharp contrast with the Gaussian 
distribution, let aside the case-by-case derivation such 
as material breaking with the weakest element en- 
tropy maximization material fragmentation (8[, and 
extreme value statistics 0, ■ 

It is well known that the binomial distribution results 
from success events for given independent trials with the 
success probability p given. When the success is a rare 
event (i.e., p is small), it reduces to the Poisson distribu- 
tion. According to the central limit theorem [ll|, (dis- 
crete) binomial and Poisson distributions approach the 
(continuous) Gaussian distribution in the limit of large 
trial numbers. In a similar spirit, we here derive a contin- 
uous Weibull-like distribution from the discrete Galton- 
Watson branching process, motivated by cell replication 
in a tissue The branching process can serve as a ba- 
sic model to describe discrete events having two possibil- 
ities, e.g., replication/non- replication or nucleation/non- 
nucleation. The generating function for this distribution 
was first obtained in the seminal work of general branch- 
ing processes [12, [TH . Specifically, asymptotics were de- 
rived in the more general case of multiple replicates and 
extinction processes at each stage of the process, added to 



possible immigration events (see for example Ref. |14j). 
but little is known about the shape of the distribution 
itself relatively to other standard distributions, except 
for few very specific cases where the limiting distribution 
can be computed exactly through the use of a rational 
form for the generating function at the first stage of the 
process and which usually leads to a simple exponen- 
tial function. Here we find that it is approached by the 
Weibull distribution in rather a wide and realistic range 
of the replication parameter p, making the two distribu- 
tions surprisingly indistinguishable in practice. 

This paper consists of four sections and an appendix. 
In Sec. II, cell replication is described in terms of a 
branching process. The stationary distribution of the 
branching process is obtained and its general proper- 
ties are discussed. Results of Monte Carlo simulations 
are also presented. Section III examines the relation be- 
tween the distributions for different replication probabil- 
ities and probe the scaling with the help of an ansatz, 
which is justified from the exact series expansion. Fi- 
nally, Sec. IV discusses and summarizes the results. In 
Appendix, all the moments of the distribution are ob- 
tained analytically from the recurrence relation of the 
generating function. 



II. CELL REPLICATION AND BRANCHING 
PROCESS 

For the binomial distribution, an independent event 
occurs at each time with given success probability. In 
cell replication, on the other hand, the number of repli- 
cation events in consideration depends on the current cell 
number of a tissue. For example, even if there exists just 
a single mother cell initially, it may replicate from time 
to time, and there may occur many replications of the 
mother and daughter cells. Accordingly, we consider the 



2 



I A 

h(l)=q M2)=p 

1 ill! [M] A 

/ 2 (l)= g 2 f 2 (2)=qp + q 2 p / 2 (3)=2g P 2 / 2 (4)=p :i 

FIG. 1: Cell replication graphs for a branching process. Cell 
number configurations at time steps n = 1 and 2 are plotted 
with the replication probability at each step is given by p; 
q = 1 — p corresponds to the probability that the cell does not 
replicate. 

probability distribution f n (l) of tissues with size (i.e., 
the number of cells) I at given time step n, which sat- 
isfies the normalization condition X^=i/n(0 = 1 with 
the maximum possible cell number in the tissue after the 
nth replication given by 2™. Note that this process can 
be described in terms of a branching process with the 
branching probability p, as illustrated in Fig. [T] Each 
graph in the figure, where sites in the nth row represent 
cells at the time step n, corresponds to one possible con- 
figuration of cell growth for the given duration. Each 
graph thus starts from a single site in the first row (i.e., 
a single mother cell initially); sites may replicate or not, 
giving birth to new sites at successive time steps (here 
the time step is fixed to be a constant). 

It is useful to consider the generating function for the 
distribution /„ at time n in the branching process 
El: 

2" 

. 9rl (z)-^/„(/)z z . (1) 
1=1 

For example, the kth moment at time n, defined to be 
J2i=i fn(l)l k , can be computed by differentiating suc- 
cessively the generating function: (z-^) k g n ( z )\z=i with 
ffn(l) = 1 f° r au n ( see Appendix for the derivation of 
all the moments). In the following, for simplicity, we 
will impose f n (l) = for I > 2". At the initial time 
(n = 0) the system contains only one element, leading to 
go(z) = z. Since the distribution f n +i is related to the 
preceding one f n via combinatorial relations, it is easy 
to show that the generating function satisfies the non- 
linear recursion equation, g n (z) = gi(g n -i( z )) for n > 1, 
where gi(z) = qz +pz 2 . This equation provides a recur- 
sive function for the newly generated sites, which are all 
independent, with the generating function gi(z). 

^From this relation, we can deduce that the total num- 
ber N(n) of configurations or graphs at (discrete) time 
n satisfies the recurrence relation N(n+1) = iV(?i)[l + 
N(n)}, with the initial condition iV(0) = 1, and grows 
rapidly in time. Indeed this relation can be obtained eas- 
ily from the observation that N(n) is equal to p n (l) with 



p and q replaced formally by unity. Therefore N (n) satis- 
fies the same relation as g n (l) above. It is also manifested 
from the physical point of view: Given N(n) graphs at 
time n, there are two possible ways to generate graphs 
at time (n+1). (i) In the case of non-replication of the 
original site, we simply have N(n) graphs; (ii) in the case 
of replication of the same site, we can attach to the two 
offsprings a total of N(n) 2 graphs. As a result, we ob- 
tain N(n)+N(n) 2 possible configurations at time (n+1). 
This can be checked in Fig. [T] for the first few graphs: 
JV(0) = 1, 2V(1) = 2, 2V(2) = 6, and so on. 

Because a tissue of size I results from (I— l)-times pro- 
liferation starting from a single cell (see Fig. [IJ , the re- 
currence relation 

g n +i(z) = qg n (z) +pgl{z) (2) 

leads to the recursive relation for the distribution f n (l) 
by simply identifying the coefficients of z l on the left and 
right sides of the last expression: 

fn+l(l) = ?/„(0 /»(*)/»(* ~ *)• (3) 

fc=l 

Namely, a tissue of size / at time n + 1 can be generated 
in the two ways: (i) no replication at the first time step 
followed by producing I descendants at the following n 
time steps and (ii) replication at the first time step fol- 
lowed by producing k descendants from one offspring and 
l—k descendants from the other offspring at the following 
n time steps. 

The size distribution, computed from Eq. ([3]), is ex- 
hibited in Fig. [2j together with that from Monte Carlo 
simulations, manifesting perfect agreement. It is of inter- 
est that Eq. ([3]) can be mapped into a process of random 
aggregation of clusters with the aggregation probability 
p. Using q = 1 — p and J2k=i fn(k) = 1, we thus obtain 

2™ l-l 

A/n(l) = -?^/„(0/nW+P^/nW/n(i " *) (4) 
k=l k=l 

with Af n (l) = f n +i(l) — fn(l)- Therefore a cluster of size 
I can be formed from aggregation of a cluster of size k and 
a cluster of size (I — k) with the aggregation probability 
P- 

Figure [3] shows the normalized size distribution for 
p = 0.3 at several time steps n = 10, 12, and 14. Re- 
markably, when size I is rescaled by the factor (1 +p) n , 
the distributions collapse into a single curve independent 
of n, suggesting the presence of a stationary distribution 
for the branching process [12]. Indeed the average cell 
number in a tissue after the nth replication with the repli- 
cation probability p is given by (1 + p) n = J2i=i Unit)- 
Note that f n (l) may be regarded as a continuous function 
/„ (x) when n is large (see Fig. [3j . Since the average cell 
number after the (n— l)th replication is (1 +p) n ~ 1 1 we 
have the scaling relation 

J dxxf n (x) = (1+p) J dx'x'U^x') 
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FIG. 2: (color online) Comparison of the tissue size distri- 
bution f n (l) at time n = 10, for the replication probability 
p = 0.3, 0.5, 0.8, and 0.9. Analytical (solid lines) and simula- 
tion (+ signs) results agree perfectly, displaying multimodal 
shapes for large values of p. In the Monte Carlo simulations 
of the branching process, starting from a single cell, we let ev- 
ery cell replicate with a given replication probability at each 
Monte Carlo step. Data have been obtained from 10 6 trial 
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FIG. 3: (color online) Weibull distribution of tissue sizes in 
the cell replication process. Cell-number distribution for p = 
0.3 at three different time steps. Distributions versus the 
rescaled size are plotted in the inset; the collapse is fitted 
with a Weibull function with the shape parameter j3 = 1.37 
(black line). 



dx{\ 



P) 1 xfn- 



L ((l+p)- 1 x),(5) 



which is consistent with the fact that the distribution 
in the long-time limit can be described by a time- 
independent stationary function f(x) with the rescaled 
size x = xjr\ and the scale parameter r\ = a(l+p) n . The 
scale factor a introduced here depends in particular on 
the replication probability p via boundary conditions, as 
discussed later. 



Finally, a quantity of interest is given by the Laplace 
transform j(A) = L°° dx e~ Xx f{x), for which the recur- 
sive relation in Eq. $3]) reads [12| 



f((l +p )X) = qf(X)+pf(X) 2 



(0) 



Equation takes the form of a Poincare-type equation 
|16| , which is directly related in property to Mahler func- 
tional equations [ 1 7| via an appropriate change of vari- 
ables {18j |. 

In the limit of small p where cells replicate very rarely, 
one may expand Eq. © as f((l+p)X) « /(A) +pXf'(X), 
to obtain the differential equation: 



A/'(A) = /(A) 2 - /(A) 



(7) 



with the initial conditions /(0) = 1 and /'(0) = —a 1 . 
The solution reads /(A) = a(X + a) -1 , the inverse 
Laplace transform of which is given by the simple ex- 
ponential function f(x) = aexp(— ax). With the con- 
straint F(l) = 1 — e _1 on the cumulative distribution 
F(x) = Jq dx' f(x'), we obtain the scaling factor a = 1 
and therefore f(x) = cxp(— x). In the opposite case of 
p = 1 where every cell replicates, we have f(2X) = /(A) 2 , 
with the simple solution satisfying the initial conditions 
given by /(A) = exp(— A/a). This leads to the Dirac delta 
distribution f(x) = S(x — a^ 1 ) and the Heaviside cumu- 
lative distribution F(x) = 9(x — a~ l ). The constraint on 
F (1) again imposes a = 1. 



III. SCALING OF THE SIZE DISTRIBUTION 

In this section, we consider the general case of < 
p < 1. As for the unique stationary distribution f(x) for 
given p, one may question whether there exists any rela- 
tion between the distribution f{x) corresponding to two 
different replication probabilities p and po, respectively. 
Since the final stationary distributions result from the 
same branching process, albeit with different branching 
probabilities, they are expected to share qualitatively the 
same properties. 

To probe the scaling of the tissue size in the replication 
process, we display in Fig. @]the cumulative distribution 
for the replication probability p = 0.1, 0.3, and 0.5. Note 
that the scale factor a in the rescaling of the size has 
been adjusted to satisfy the condition F(i=l) = 1 — e _1 . 
To probe the functional relations between the cumula- 
tive distributions for different values of p under the con- 
straints for F, we consider the change of variable x — > x@ , 
as the simplest possibility, where the exponent j3 = fi{p) 
is then adjusted to make all curves for considered values 
of p collapse onto a single curve. This ansatz indeed leads 
to the collapse of different cumulative distributions into 
a unique distribution Fq(x) = 1 — e~ x , as shown in the 
inset. Therefore the new variable x@ determines the func- 
tional form of F(x), at least for the numerical cases con- 
sidered. Indeed, using the known result F (J) = 1 — e~ x 
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FIG. 4: (color online) Cumulative distribution for p = 0.1, 0.3, 
and 0.5. The rescaled size is given by x = a~ (1 +p)~ n l with 
n = 20, where a is the scale factor to adjust F(l) = 1 — e _1 . 
Rescaled cumulative distribution functions are plotted in the 

inset, disclosing the collapse into the function F(x) = 1 — e~ x 
(black line). 

in the limit p — >• 0, wc obtain F(x) = 1 — e~ x<3 with 
a good precision for p > 0, which leads to the Weibull 
distribution. 

The ansatz of the scaling x$ can be justified from the 
exact series expansion of the distribution f{x). Multiply- 
ing both sides of Eq. (|6|) by exp(— iXx), performing the 
rotation A — > iA, and integrating over A along the real 
axis, we obtain 

-i-ZKl+p)-^) = qf(x)+p f dx'f(x')f(x-x'). (8) 
1 +P J a 

It can be shown that f(x) admits a series expan- 
sion in powers of x consistent with the previous re- 
lation. In particular, f{x) vanishes at the origin as 
f(x) k aoi^ -1 , with some constant ao and exponent 
/3 = -[log(l+p)] _1 log(l-i>) > 1 [3. Here this anal- 
ysis can be extended to consecutive terms to yield the 
following expansion 

f(x)=x f> - 1 ^2a k x kf> , (9) 

k>0 

where identifying the powers in Eq. (|5]) gives the recur- 
sion relation for the coefficients: 

fc-i 

(q k+1 - q)a k = p ^ B(p(l+l),0(k-t))at a fc _!_ ; (10) 
1=0 

with the beta function B{x,y) = dtt x - l (l - i)* -1 . 
Here ao is the only unknown parameter depending on 
boundary conditions, since Eq. (|10l) implies the propor- 
tionality relation a k oc aj +fc . 

^From these results, it is easy to see that f(x) can be 
cast into the form 

f{x) = a i' 3 - 1 -F(aoi /3 ) (H) 



with the unique regular expansion of the scaling function: 
= Tlk>o ®kX k , where a k satisfies the relation in Eq. 
(|10P but with the initial term ao = 1; this determines 
uniquely all the other coefficients a k for k > 1. The cu- 
mulative distribution F(x) is equal to a scaling function 
of the variable a^xP alone since 

P k>0 

where Q is, like J 7 , uniquely defined by the coefficients 
Ofe. The parameter ao is defined according to the con- 
straint F(l) = 1 — e _1 , and can be related to a via the 
equation for the first moment J dxxf{x) = a -1 . This 

relation simply gives ao = a^ [J °° u 1 ^7 r (u)dw] . Note 
that the cumulative distribution F is a function of the 
variable .t' 3 up to a scaling factor, which is also true for 
the Weibull distribution, F(x^) = 1 — cxp(— x@) with 
x = x/rj. In the limit of small p, /3 is close to unity and 
one can show that the expansion coefficients satisfying 
Eq. (fit)]) are approximatively given by a k = {—l) k /k\. 
Therefore Q{a^x^) wl - exp(— aoi^) is indeed close to 
the Weibull distribution. 

The previous results show that the distribution can 
be expanded as a series and vanishes as a power law 
with the exponent /3— 1 related to the replication prob- 
ability p. In the opposite case of large x, the inte- 
gral equation ((5J) can be analyzed. Since we expect 
f(x) to decrease with x and assume the stretched ex- 
ponential behavior: f(x) ~ cxp(— a^x^ ) with con- 
stant, we observe that in Eq. ([8]) the left-hand-side term 
/((l+p) _1 i) oc exp[— doo(l +p) ^ } is dominant over 
the first term f(x) on the right-hand side. The last term 
can be analyzed by means of the saddle point analysis for 
the function x 1 " + (x — x 1 ) 13 appearing in the exponen- 
tial contribution. The saddle point, obtained by taking 
the extremum of this quantity with respect to x' , corre- 
sponds to the middle point of the integration x' = x/2. 
The overall integral gives therefore a dominant contri- 
bution proportional to exp[— 2a oa (x/2)P }. The ansatz 
is consistent if the two coefficients satisfy the relation 
(1 + p)- p ' = 2 1 "' 3 '. This results in a new exponent 
0' = log 2 [log 2 — Iog(l+p)] _1 valid in the asymptotic 
limit; this was also obtained in Ref. (l2j . 



IV. DISCUSSION 

It has been shown that the replication process of cells 
with not too large replication probability (p < 0.5) gives 
rise to a distribution extremely close to the Weibull func- 
tion. The parameters of the Weibull distribution can 
then be related with the first two moments of the dis- 
tribution function f n {x): (1 + p) n = ^(l+fi^ 1 ) and 
2(1 +P) 2 ' 1 - 1 = ?7 2 r(l+2/3- 1 ), where T{x) is the Gamma 
function. This leads to the following relation between the 
replication probability p and the shape parameter j3 w of 
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FIG. 5: (color online) Relation between exponent /3 W of the 
Weibull distribution and the replication probability p of the 
branching process. The exponent f3 is also plotted for com- 
parison. 

the Weibull distribution: 

2 K TPw _ t (13 

r(i+2^ 1 ) v y 

which is exhibited in Fig. [5j In addition, the scale factor 
a in the rescaling parameter rj = a(l + p) n is given by 
a = T~ 1 (l+[3~ 1 ). Note that the exponents /? and 
are hardly distinguishable for p < 0.5, where the scaling 
function J- is asymptotically similar to an exponential. 
This suggests that the distribution in Eq. (fTTj) belongs to 
the Weibull class for small p. This regime applies to many 
cases in nature that a certain event such as replication 
or nucleation occurs with probability less than 50 % at 
a given time unit. On the other hand, the replication 
process with a large value of p results in a different type 
of distribution, e.g., a multimodal distribution (see Fig. 

n. 

In conclusion, the branching process provides a general 
mechanism of the Weibull distribution with j3 < 2, corre- 
sponding to the branching probability p < 0.5. We have 
also found that the branching process can be mapped 
into a process of aggregation of clusters. A recent exam- 
ple includes the protein aggregation process with fission, 
where the Weibull distribution with j3 ~ 2 emerges as a 
stationary solution [l9j . 
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Appendix: Moment expression 

The functional equation, given by Eq. for the 

Laplace transform of the size distribution can also be de- 
rived with the help of moments of the distribution. Here 
we briefly mention how to evaluate recursively all these 
moments starting from the generating function. ^From 
the relation 

(x) n = z^-g n {z)\ z= i = zg'i{g n -i{z))g' n -i{z)\ z= i (A.l) 
az 

with the initial condition <?q(1) = 1, the average number 
of elements is simply: 

(x} n = (l+p){x) n ^ = (l+p) n , 

whereas the second moment is given by 

{x 2 ) n =zg' n {z) + z 2 g^z)\ z=l . (A.2) 

To evaluate g"(l), we differentiate the recursion relation 
for the generating function and obtain 

9n+i( z ) = 9i(9n{z))g„{z) + g'i(g n (z))g'n(z), 
which leads to 

g : +1 (i) = 2p(i+pf n + (i+p) g :(i). 

Noting that g"(l) = 2p and g'o(l) = 0, we obtain the 
general solution of the previous recursion 

5 ;:(i) = 2[(i+ P ) 2 '- 1 -(i+ P )"- 1 ] 

and the second moment 

(x 2 ) n = 2(l+p) 2n - 1 -(l-p)(l+ P )"- 1 

m 2(1 +p) 2n -\ (A.3) 

In this large-n (i.e., long-time) limit, one may define 

the scaling relation (x k ) n ~ git\l) — hk{l +p) kn , where 
the first few coefficients read 

h = h 1 = l, h 2 = -^—. (A.4) 
l+p 

For the fcth moment (x k ) n , given by a sum of derivatives 
of g n , it is indeed sufficient to compute the largest (i.e., 
kth) derivative of g n , which gives the essential contribu- 
tion to the coefficient h^. 

A general method can be developed to evaluate the 
successive moments by computing the dominant part of 
the derivatives of g n (z) in the large-n limit. The k th 
derivative g^\z) satisfies indeed the following relation: 

g { nUz) = g'{(g n (z))T n!k (z) + g'Mz))gW(z) (A.5) 

with the initial conditions T„.i(z) = 0, T n ^{z) = g l2 (z), 
and T n ^(z) = 3g' n (z)g'^(z). Taking the derivative of Eq. 
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(IA.5|) with respect to z, we obtain the relation for the 
coefficient T n k{z): 



T ni k+i(z) = ^T nM {z)+g' n {z)g^(z). 

This can be solved by iterations 
fc-i 



T, 



n,fc+l 



(*) = E 



m— 



9z r 



(A.6) 



h — 1 m 



m=0 2=0 



EE K^'w^W'^ 



where it has been noticed that T n> k{z) contains at most 
the (/c— l)th derivative of g n {z). 

Since ^(1) = 1+p and 5" (1) = 2p, Eq. ([XT5]) . together 
with Eq. (|A.7|) . bears the solution for z = 1: 



n— 1 fc— 2 m n — 1 

' m 



&(!) = 2pE(! +^ T ^0) = E EE(!+rf 3 ( 7)^(1) fl^W. (A- 

j=0 m=0 Z=0 j=0 



In the large-n limit, we may use the scaling relation (1) = /ifc(l + p) fe ™, so that the dependency on n can be 
factorized, which leads to the non-linear recursive relation for h^: 



h 



k 



1 k \ rt fe— 2 m / 

(a; ) n 2p l m 



This equation, together with the low-order coefficients in Eq. (|A.4|) . gives all the successive coefficients by simple 
iterations. I 

^From the nonlinear relations in Eq. ()A.9|) . one can re- _ afc ^ a °^^ (All) 

construct directly the Laplace transform of the stationary ^-^ A' 3 ( fe+1 ) A/ 3 

distribution in Eq. 



/(A) 



/ fie^^E^, (A.10) 
Jo fe>0 



for which the functional equation can be obtained. 

In addition, Eq. © gives directly the exact large-A 
behavior of the Laplace transform /(A) (see also [IH), 
which can be written as 



/>oo />oo 

/(A) = / cEe- Ai /(iH]ra, / c 
Jo k>0 Jo 
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